Deep Semantic Role Labeling Model
Paper: Deep Semantic Role Labeling: What Works and What’s Next
Training data: CoNLL 2003
Code: Deep SRL
Model Architecture
Compared with the CNN-BiLSTM-CRF model, deep-srl is much simpler, yet its performance holds up.
The model mainly modifies the LSTM, adding recurrent dropout and a highway connection.
LSTM equations:
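Written out (the standard LSTM formulation, using the same gate order as the code below):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$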
Adding the highway connection on top of the LSTM:
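A transform gate $r_t$ interpolates between the LSTM output and a linear projection of the layer input, matching `r_gate` and `k` in the code below:

$$
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
h_t &= r_t \odot o_t \odot \tanh(c_t) + (1 - r_t) \odot (W_k x_t + b_k)
\end{aligned}
$$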
Recurrent dropout is applied to the LSTM cell's output:
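During training, a Bernoulli keep-mask rescales the hidden output; written in the standard inverted-dropout form (the exact masking scheme lives in RnnDropout, sketched in the Code section below):

$$
h_t \leftarrow \frac{z}{1-p} \odot h_t, \qquad z_i \sim \mathrm{Bernoulli}(1-p)
$$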
Finally, a fully connected layer outputs a probability for each tag.
Results
Labels
{'<pad>': 0, 'B-MISC': 1, 'B-ORG': 2, 'I-MISC': 3, 'I-LOC': 4, 'I-PER': 5, 'B-LOC': 6, 'O': 7, 'I-ORG': 8}
Example 1
Sample: neuchatel @ st gallen @
Predicted: I-ORG O I-ORG I-ORG O
Gold: I-ORG O I-ORG I-ORG O
Example 2
Sample: kankkunen has set an astonishing pace for a driver who has not rallied for three months .
Predicted: I-PER O O O I-PER O O O O O O O O O O O O
Gold: I-PER O O O O O O O O O O O O O O O O
(The model incorrectly tags "astonishing" as I-PER.)
Training results
Code
LSTM modifications
- Two extra gates compared with a vanilla LSTM, corresponding to the highway transform gate and a linear transform of the previous layer's output
- Recurrent dropout uses a Bernoulli mask (see the RnnDropout sketch below)
- Each LSTM layer's output is reversed and used as the next layer's input, so stacked layers alternate direction
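RnnDropout is referenced by the cell but was not included in the original listing. Below is a minimal sketch that matches the call site's signature and the Bernoulli description above; the body is an assumption, not the repo's implementation. The imports here cover the whole listing.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class RnnDropout(nn.Module):
    # Inverted dropout with a Bernoulli keep-mask on the hidden state.
    # Note: variational recurrent dropout (Gal & Ghahramani, 2016) would
    # sample one mask per sequence and reuse it at every timestep; this
    # sketch samples a fresh mask on each call, which is how the cell
    # below invokes it.
    def __init__(self, dropout_prob, hsz, is_cuda):
        super().__init__()
        self.keep_prob = 1.0 - dropout_prob
        self.hsz = hsz
        self.is_cuda = is_cuda  # kept only for signature compatibility

    def forward(self, x):
        # Sample a keep-mask and rescale so the expected value is unchanged.
        mask = torch.bernoulli(x.new_full((x.size(0), self.hsz), self.keep_prob))
        return x * mask / self.keep_prob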
class HwLSTMCell(nn.Module):
    def __init__(self, isz, hsz, dropout_prob, is_cuda):
        super().__init__()
        self.hsz = hsz
        # Six input projections: the i, f, g, o, r gates plus the highway carry k.
        self.w_ih = nn.Parameter(torch.Tensor(6 * hsz, isz))
        # Only five hidden projections: the carry k depends on the input alone.
        self.w_hh = nn.Parameter(torch.Tensor(5 * hsz, hsz))
        self.b_ih = nn.Parameter(torch.Tensor(6 * hsz))
        self.rdropout = RnnDropout(dropout_prob, hsz, is_cuda)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hsz)
        for weight in self.parameters():
            nn.init.uniform_(weight, -stdv, stdv)

    def forward(self, input, hidden=None):
        if hidden is None:
            hidden = input.new_zeros(input.size(0), self.hsz)
            hidden = (hidden, hidden)
        hx, cx = hidden
        # Project the input once for all six chunks.
        input = F.linear(input, self.w_ih, self.b_ih)
        # The first five chunks are gates and also receive the hidden projection.
        gates = F.linear(hx, self.w_hh) + input[..., :-self.hsz]
        in_gate, forget_gate, cell_gate, out_gate, r_gate = gates.chunk(5, 1)
        in_gate, forget_gate, out_gate, r_gate = map(
            torch.sigmoid, [in_gate, forget_gate, out_gate, r_gate])
        cell_gate = torch.tanh(cell_gate)
        # Highway carry: the sixth chunk, a linear transform of the layer input.
        k = input[..., -self.hsz:]
        cy = forget_gate * cx + in_gate * cell_gate
        # The transform gate r interpolates between the LSTM output and the carry.
        hy = r_gate * out_gate * torch.tanh(cy) + (1. - r_gate) * k
        # Recurrent dropout on the hidden output, training only.
        if self.training:
            hy = self.rdropout(hy)
        return hy, cy
class HwLSTMlayer(nn.Module):
    def __init__(self, isz, hsz, dropout_prob, is_cuda):
        super().__init__()
        self.cell = HwLSTMCell(isz, hsz, dropout_prob, is_cuda)

    def forward(self, input, reverse=True):
        # input: (seq_len, batch, isz); unroll the cell over the time dimension.
        output, hidden = [], None
        for i in range(len(input)):
            hidden = self.cell(input[i], hidden)
            output.append(hidden[0])
        if reverse:
            # Reversing the outputs makes the next stacked layer process the
            # sequence in the opposite direction (interleaved directions).
            output.reverse()
        return torch.stack(output)
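A minimal usage sketch, stacking four layers with alternating directions and projecting each hidden state to tag probabilities; the layer count, sizes, and variable names here are illustrative, not taken from the repo:

isz, hsz, num_tags = 100, 300, 9  # 9 matches the label set above
layers = nn.ModuleList(
    HwLSTMlayer(isz if l == 0 else hsz, hsz, 0.1, is_cuda=False)
    for l in range(4))
proj = nn.Linear(hsz, num_tags)

x = torch.randn(20, 8, isz)  # (seq_len, batch, features), e.g. word embeddings
for layer in layers:
    # Each layer reverses its output, so consecutive layers run in opposite
    # directions; an even layer count restores the original token order.
    x = layer(x)
log_probs = F.log_softmax(proj(x), dim=-1)  # per-token tag log-probabilities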